AITopics | regularization property

regularization property

Implicit Regularization of Accelerated Methods in Hilbert Spaces

Nicolò Pagliana, Lorenzo Rosasco

Neural Information Processing Systems, Feb-14-2026, 01:46:24 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, gradient descent, qualification, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Regularization properties of adversarially-trained linear regression

Neural Information Processing Systems, Dec-25-2025, 01:21:48 GMT

State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against such perturbations. Formulated as a min-max problem, it searches for the best solution when the training data are corrupted by worst-case attacks. Linear models are among the simplest models in which these vulnerabilities can be observed, and they are the focus of our study. In this case, adversarial training leads to a convex optimization problem that can be formulated as the minimization of a finite sum. We provide a comparative analysis between the solution of adversarial training in linear regression and other regularization methods. Our main findings are that: (A) Adversarial training yields the minimum-norm interpolating solution in the overparameterized regime (more parameters than data), as long as the maximum disturbance radius is smaller than a threshold. And, conversely, the minimum-norm interpolator is the solution to adversarial training with a given radius.
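To make the min-max formulation concrete, here is a minimal Python sketch of the inner maximization for a single training point. It assumes squared loss and an ℓ2-bounded perturbation of radius `delta` (the abstract does not name the norm), in which case the worst-case loss has the closed form (|y − xᵀβ| + δ‖β‖₂)². All function and variable names are illustrative and are not taken from the paper.

```python
import numpy as np

def worst_case_sq_loss(x, y, beta, delta):
    """Closed-form worst-case squared loss under an l2 attack of radius delta (assumed setting)."""
    residual = y - x @ beta
    return (abs(residual) + delta * np.linalg.norm(beta)) ** 2

def worst_case_perturbation(x, y, beta, delta):
    """Perturbation that attains the maximum; assumes beta is not the zero vector."""
    residual = y - x @ beta
    sign = 1.0 if residual >= 0 else -1.0
    # Push the prediction away from y along the direction of beta.
    return -delta * sign * beta / np.linalg.norm(beta)

rng = np.random.default_rng(0)
x = rng.normal(size=6)
beta = rng.normal(size=6)
y, delta = 0.5, 0.1

closed_form = worst_case_sq_loss(x, y, beta, delta)
d_star = worst_case_perturbation(x, y, beta, delta)
attained = (y - (x + d_star) @ beta) ** 2

# Random perturbations on the boundary of the l2 ball should never exceed the closed form.
directions = rng.normal(size=(1000, 6))
directions *= delta / np.linalg.norm(directions, axis=1, keepdims=True)
sampled = (y - (x + directions) @ beta) ** 2

print(closed_form, attained, sampled.max())
```

Under these assumptions, averaging `worst_case_sq_loss` over the training set gives a finite-sum objective that is convex in `beta`, since each term is the square of a nonnegative convex function.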

adversarially-trained linear regression, name change, regularization property, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Implicit Regularization of Accelerated Methods in Hilbert Spaces

Nicolò Pagliana, Lorenzo Rosasco

Neural Information Processing Systems, Aug-20-2025, 01:47:40 GMT

Our theoretical results are validated by numerical simulations. Our analysis is based on studying suitable polynomials induced by the accelerated dynamics and combining spectral techniques with concentration inequalities.

algorithm, gradient descent, qualification, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Deep convolutional models have been at the heart of the recent successes of deep learning in problems where the data consists of high-dimensional signals, such as image classification or speech recognition. The convolution and pooling operations in these architectures are known to be crucial for their practical success, yet our theoretical understanding of how they enable efficient learning is still limited. One key difficulty for understanding such models is the curse of dimensionality: due to the high dimensionality of the input data, it is hopeless to learn arbitrary functions from samples. For instance, classical non-parametric regression techniques for approximating Lipschitz or Sobolev functions typically require either low dimension or an order of smoothness of the target function comparable to the dimension in order to obtain good generalization (e.g., Wainwright, 2019), which is a very strong assumption when dealing with high-dimensional signals. Thus, further assumptions on the target function are needed to make the problem more tractable, in a way that makes convolutions a useful modeling tool. Various works have studied approximation benefits with models that resemble deep convolutional architectures, for instance through hierarchical models with local connectivity (Mhaskar and Poggio, 2016; Schmidt-Hieber et al., 2020), or through structured tensor decompositions (Cohen and Shashua, 2017). Nevertheless, while such function classes may provide improved statistical efficiency, it is unclear if they can be learned with computationally efficient algorithms, which makes it difficult to assess the validity of these approximation models empirically. In order to overcome the computational difficulties, we provide a different perspective based on kernel methods (e.g., Schölkopf and Smola, 2001; Wahba, 1990), which are known to be computationally tractable with well-understood statistical and approximation properties.
In particular, we consider "deep" structured kernels known as convolutional kernels, which have produced good empirical performance on standard

architecture, convolutional kernel, kernel, (15 more...)

arXiv.org Machine Learning

2102.10032

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

On the Regularization Properties of Structured Dropout

Ambar Pal, Connor Lane, René Vidal, Benjamin D. Haeffele

arXiv.org Machine Learning, Oct-30-2019

Dropout and its extensions (e.g., DropBlock and DropConnect) are popular heuristics for training neural networks, which have been shown to improve generalization performance in practice. However, a theoretical understanding of their optimization and regularization properties remains elusive. Recent work shows that in the case of single hidden-layer linear networks, Dropout is a stochastic gradient descent method for minimizing a regularized loss, and that the regularizer induces solutions that are low-rank and balanced. In this work we show that for single hidden-layer linear networks, DropBlock induces spectral k-support norm regularization, and promotes solutions that are low-rank and have factors with equal norm. We also show that the global minimizer for DropBlock can be computed in closed form, and that DropConnect is equivalent to Dropout. We then show that some of these results can be extended to a general class of Dropout strategies and, with some assumptions, to deep non-linear networks when Dropout is applied to the last layer. We verify our theoretical claims and assumptions experimentally with commonly used network architectures.
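The statement that Dropout minimizes a regularized loss can be checked numerically in the simplest case. The sketch below assumes a single hidden-layer linear network `U @ diag(z) @ V.T @ x / theta` with independent Bernoulli(`theta`) masks `z` and squared loss; under those assumptions the expected Dropout loss equals the plain squared loss plus the penalty ((1 − θ)/θ) Σᵢ ‖uᵢ‖² (vᵢᵀx)². It covers plain Dropout only, not the DropBlock or DropConnect results, and all names are illustrative.

```python
import itertools
import numpy as np

def expected_dropout_loss(U, V, x, y, theta):
    """Exact expectation of the squared loss over all 2^r dropout masks."""
    r = U.shape[1]
    h = V.T @ x  # hidden activations before masking
    total = 0.0
    for mask in itertools.product([0, 1], repeat=r):
        z = np.array(mask, dtype=float)
        prob = np.prod(np.where(z == 1, theta, 1 - theta))
        pred = U @ (z * h) / theta  # inverse scaling keeps the prediction unbiased
        total += prob * np.sum((y - pred) ** 2)
    return total

def regularized_loss(U, V, x, y, theta):
    """Deterministic squared loss plus the dropout-induced penalty."""
    h = V.T @ x
    fit = np.sum((y - U @ h) ** 2)
    penalty = (1 - theta) / theta * np.sum(np.sum(U ** 2, axis=0) * h ** 2)
    return fit + penalty

rng = np.random.default_rng(0)
U = rng.normal(size=(3, 4))  # output weights, 4 hidden units
V = rng.normal(size=(5, 4))  # input weights
x = rng.normal(size=5)
y = rng.normal(size=3)
theta = 0.7                  # keep probability

lhs = expected_dropout_loss(U, V, x, y, theta)
rhs = regularized_loss(U, V, x, y, theta)
print(lhs, rhs)
```

The enumeration over masks is exponential in the number of hidden units, so this is only a sanity check for small networks, not a training procedure.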

dropblock, dropout, objective, (15 more...)

arXiv.org Machine Learning

1910.14186

Country: North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

On Regularization Properties of Artificial Datasets for Deep Learning

Karol Antczak

arXiv.org Machine Learning, Aug-19-2019

In this paper, we have presented analogies between the regularization methods for deep learning and the data augmentation process interpreted as noise injection. It was shown that, by generating the input data from high-level features, it is possible to regularize hidden layers of the network by exploiting the ability of deep networks to learn hierarchical representations. The analysis given here is theoretical, but there already are experimental results that partially confirm these observations. A case of convolutional neural networks for stenosis detection [14] has shown that pretraining the network on an artificial dataset results in a reduction of the test error rate on the real dataset and, thus, a smaller generalization gap. An improvement in test accuracy was also observed in the case of recurrent neural networks for ECG filtering, pretrained with synthetic signals [15]. A more definitive confirmation should be expected from the comparison of models trained for the same task with datasets created by injecting noise either into input features or into high-level features of the real data.

artificial intelligence, machine learning, regularization, (16 more...)

arXiv.org Machine Learning

1908.07005

Country:

North America (0.14)
Europe > Poland (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Implicit Regularization of Accelerated Methods in Hilbert Spaces

Nicolò Pagliana, Lorenzo Rosasco

arXiv.org Machine Learning, Jun-18-2019

We study learning properties of accelerated gradient descent methods for linear least-squares in Hilbert spaces. We analyze the implicit regularization properties of Nesterov acceleration and a variant of heavy-ball in terms of corresponding learning error bounds. Our results show that acceleration can provide faster bias decay than gradient descent, but also suffers from more unstable behavior. As a result, acceleration cannot in general be expected to improve learning accuracy with respect to gradient descent, but rather to achieve the same accuracy with reduced computation. Our theoretical results are validated by numerical simulations. Our analysis is based on studying suitable polynomials induced by the accelerated dynamics and combining spectral techniques with concentration inequalities.
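For readers who want to see the two iterations side by side, the following is a minimal finite-dimensional sketch in Python. It is not the paper's experiment: it assumes a noiseless synthetic least-squares problem with an ill-conditioned design, a step size of 1/L, and the common (t − 1)/(t + 2) momentum schedule for Nesterov acceleration. The problem sizes and all names are illustrative choices.

```python
import numpy as np

def least_squares_loss(X, y, w):
    return 0.5 * np.mean((X @ w - y) ** 2)

def gradient(X, y, w):
    return X.T @ (X @ w - y) / len(y)

def gradient_descent(X, y, step, n_iters):
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w = w - step * gradient(X, y, w)
    return w

def nesterov(X, y, step, n_iters):
    w_prev = np.zeros(X.shape[1])
    w = w_prev.copy()
    for t in range(n_iters):
        momentum = max(t - 1, 0) / (t + 2)
        v = w + momentum * (w - w_prev)  # extrapolated point
        w_prev, w = w, v - step * gradient(X, y, v)
    return w

rng = np.random.default_rng(0)
n, d = 200, 20
scales = np.logspace(0, -1.5, d)          # decaying feature scales: ill-conditioned design
X = rng.normal(size=(n, d)) * scales
w_star = np.ones(d)
y = X @ w_star                            # noiseless targets (assumption of this sketch)

L = np.linalg.eigvalsh(X.T @ X / n).max() # smoothness constant of the loss
step, n_iters = 1.0 / L, 200

initial_loss = least_squares_loss(X, y, np.zeros(d))
gd_loss = least_squares_loss(X, y, gradient_descent(X, y, step, n_iters))
nesterov_loss = least_squares_loss(X, y, nesterov(X, y, step, n_iters))
print(initial_loss, gd_loss, nesterov_loss)
```

Because the targets are noiseless, the training loss here tracks only the bias term; the instability the abstract mentions would show up in the variance term once noise is added to `y`, where the number of iterations acts as the regularization parameter.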

artificial intelligence, gradient descent, machine learning, (16 more...)

arXiv.org Machine Learning

1905.13

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.77)

Identifying global optimality for dictionary learning

Lei Le, Martha White

arXiv.org Machine Learning, Aug-6-2017

Learning new representations of input observations in machine learning is often tackled using a factorization of the data. For many such problems, including sparse coding and matrix completion, learning these factorizations can be difficult, both in terms of efficiency and in guaranteeing that the solution is a global minimum. Recently, a general class of objectives, which we term induced dictionary learning models (DLMs), has been introduced; these objectives have an induced convex form that enables global optimization. Though attractive theoretically, this induced form is impractical, particularly for large or growing datasets. In this work, we investigate the use of practical alternating minimization algorithms for induced DLMs that ensure convergence to global optima. We characterize the stationary points of these models and, using these insights, highlight practical choices for the objectives. We then provide theoretical and empirical evidence that alternating minimization, from a random initialization, converges to global minima for a large subclass of induced DLMs. In particular, we take advantage of the existence of the (potentially unknown) convex induced form to identify when stationary points are global minima for the dictionary learning objective. We then provide an empirical investigation into practical optimization choices for using alternating minimization for induced DLMs, for both batch and stochastic gradient descent.

artificial intelligence, machine learning, regularizer, (16 more...)

arXiv.org Machine Learning

1604.04942

Country: North America (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
